SARIMA (Seasonal ARIMA): seasonality handling, differencing, and forecasting#

SARIMA extends ARIMA to explicitly model seasonal patterns (weekly, monthly, quarterly, …) by adding a second ARIMA component that operates at the seasonal lag s.

This notebook focuses on:

  • Precise seasonality handling in SARIMA

  • Seasonal vs non-seasonal components

  • When SARIMA is required (vs plain ARIMA)

  • Seasonal differencing and the tuple (p,d,q)(P,D,Q,s)

  • Intuition via decomposition-style plots and multi-season forecasts

  • A low-level NumPy implementation of the mechanics (differencing + lag polynomials + recursive forecasting)

Prerequisites#

  • AR / MA / ARMA basics

  • Stationarity and differencing

  • Lag operator notation (optional but helpful)

1) What does “seasonality” mean, and how does SARIMA handle it?#

A time series has seasonality when its behavior repeats with a fixed period s:

  • Deterministic seasonality: a stable repeating pattern in the mean (e.g., “December is always higher”).

  • Stochastic seasonality: seasonal effects are correlated across cycles (e.g., “this December depends on last December”).

SARIMA is designed to handle stochastic seasonality by combining:

  1. Seasonal differencing D (lag s): removes seasonal unit roots / repeating seasonal levels.

  2. Seasonal AR terms P: correlation at lags s, 2s, …, Ps.

  3. Seasonal MA terms Q: how seasonal shocks propagate to lags s, 2s, …, Qs.

If seasonality is purely deterministic, alternatives can be simpler:

  • seasonal dummies (month-of-year indicators)

  • Fourier terms (smooth seasonality)

  • decomposition + ARIMA on the remainder

But when residuals still show seasonal autocorrelation, SARIMA is often the right tool.
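To make the deterministic alternatives above concrete, here is a minimal sketch of building seasonal dummies and Fourier terms as regressors (the index, `fourier_terms` helper, and parameter choices are illustrative, not part of the model above):

```python
import numpy as np
import pandas as pd

# Monthly index; month-of-year dummies capture a fixed seasonal mean shift.
idx = pd.date_range("2020-01-01", periods=36, freq="MS")
dummies = pd.get_dummies(idx.month, prefix="m", drop_first=True)  # 11 columns

# Fourier terms give a smooth seasonal shape with fewer parameters.
def fourier_terms(t, period, K):
    """Return 2K columns: sin/cos pairs for harmonics 1..K."""
    t = np.asarray(t, dtype=float)
    cols = []
    for k in range(1, K + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

X_fourier = fourier_terms(np.arange(len(idx)), period=12, K=2)  # shape (36, 4)
```

Either matrix can then be regressed on (or passed as `exog` to SARIMAX) to absorb a purely deterministic seasonal pattern.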

2) Seasonal vs non-seasonal components (and the full SARIMA equation)#

Let L be the lag operator: L^k y_t = y_{t-k}.

Non-seasonal differencing (order d):

\[\nabla y_t = (1 - L) y_t = y_t - y_{t-1}, \qquad \nabla^d = (1-L)^d\]

Seasonal differencing (order D, season length s):

\[\nabla_s y_t = (1 - L^s) y_t = y_t - y_{t-s}, \qquad \nabla_s^D = (1-L^s)^D\]

Define the stationary transformed series:

\[w_t = \nabla^d \nabla_s^D y_t\]

A SARIMA(p,d,q)(P,D,Q,s) model is:

\[\Phi_P(L^s)\,\phi_p(L)\,w_t = c + \Theta_Q(L^s)\,\theta_q(L)\,\varepsilon_t\]

with (one common sign convention):

\[\phi_p(L) = 1 - \phi_1 L - \cdots - \phi_p L^p\]
\[\theta_q(L) = 1 + \theta_1 L + \cdots + \theta_q L^q\]
\[\Phi_P(L^s) = 1 - \Phi_1 L^s - \cdots - \Phi_P L^{Ps}\]
\[\Theta_Q(L^s) = 1 + \Theta_1 L^s + \cdots + \Theta_Q L^{Qs}\]

Interpretation of the tuple:

  • (p,d,q) controls short-lag (non-seasonal) dynamics and differencing.

  • (P,D,Q,s) controls seasonal-lag dynamics and seasonal differencing.

  • s is the number of observations per season (e.g., 12 for monthly data with yearly seasonality).

Key intuition: SARIMA is multiplicative. The combined AR lag polynomial is the product \phi_p(L)\Phi_P(L^s), which introduces cross-lags like L^{s+1}.


import numpy as np
import pandas as pd

import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots

from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.statespace.sarimax import SARIMAX

pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

rng = np.random.default_rng(0)
# --- Synthetic monthly data with clear seasonality ---
# We purposefully include:
# - deterministic seasonality (sin/cos)
# - stochastic seasonal dependence (AR at lag s)
# - a mild trend (to motivate non-seasonal differencing)

season_length = 12
n_years = 10
n_obs = n_years * season_length

t = np.arange(n_obs)
dates = pd.date_range("2010-01-01", periods=n_obs, freq="MS")

det_seasonal = 8.0 * np.sin(2 * np.pi * t / season_length) + 3.0 * np.cos(2 * np.pi * t / season_length)
trend = 0.10 * t

seasonal_noise = np.zeros(n_obs)
short_noise = np.zeros(n_obs)
eps_seasonal = rng.normal(0.0, 0.8, size=n_obs)
eps_short = rng.normal(0.0, 1.0, size=n_obs)

for i in range(n_obs):
    short_noise[i] = (0.6 * short_noise[i - 1] if i > 0 else 0.0) + eps_short[i]
    seasonal_noise[i] = (0.7 * seasonal_noise[i - season_length] if i >= season_length else 0.0) + eps_seasonal[i]

y = 50.0 + trend + det_seasonal + seasonal_noise + short_noise
series = pd.Series(y, index=dates, name="y")

series.head(), series.tail()
(2010-01-01    53.888172
 2010-02-01    57.909024
 2010-03-01    60.006114
 2010-04-01    57.476490
 2010-05-01    54.720164
 Freq: MS, Name: y, dtype: float64,
 2019-08-01    53.911005
 2019-09-01    52.239645
 2019-10-01    50.273459
 2019-11-01    51.970637
 2019-12-01    58.234690
 Freq: MS, Name: y, dtype: float64)
fig = go.Figure()
fig.add_trace(go.Scatter(x=series.index, y=series.values, mode="lines", name="y"))
fig.update_layout(
    title="Synthetic monthly series with seasonality",
    xaxis_title="Date",
    yaxis_title="Value",
    template="plotly_white",
    height=420,
)
fig.show()

3) Seasonal effects (visual intuition)#

Two complementary views are useful:

  • Within-season distribution: how does the typical value differ by month-of-year?

  • Seasonal decomposition intuition: treat the series as trend + seasonal + residual (an intuitive lens, not the SARIMA model itself).

Additive decomposition intuition:

\[y_t \approx T_t + S_t + R_t, \qquad S_{t+s} = S_t\]
df = series.to_frame()
df["month_num"] = df.index.month
df["month"] = df["month_num"].map(lambda m: pd.Timestamp(2000, m, 1).strftime("%b"))

month_order = [pd.Timestamp(2000, m, 1).strftime("%b") for m in range(1, 13)]

fig = px.box(
    df,
    x="month",
    y="y",
    category_orders={"month": month_order},
    points="all",
    title="Seasonal effect view: distribution by month-of-year",
)
fig.update_layout(template="plotly_white", height=420)
fig.show()
decomp = seasonal_decompose(series, model="additive", period=season_length)

fig = make_subplots(
    rows=4,
    cols=1,
    shared_xaxes=True,
    vertical_spacing=0.03,
    subplot_titles=("Observed", "Trend (smoothed)", "Seasonal component", "Residual"),
)

fig.add_trace(go.Scatter(x=series.index, y=decomp.observed, mode="lines", name="Observed"), row=1, col=1)
fig.add_trace(go.Scatter(x=series.index, y=decomp.trend, mode="lines", name="Trend"), row=2, col=1)
fig.add_trace(go.Scatter(x=series.index, y=decomp.seasonal, mode="lines", name="Seasonal"), row=3, col=1)
fig.add_trace(go.Scatter(x=series.index, y=decomp.resid, mode="lines", name="Residual"), row=4, col=1)

fig.update_layout(
    title="Seasonal decomposition intuition (additive)",
    template="plotly_white",
    height=900,
    showlegend=False,
)
fig.show()

4) Seasonal differencing (and why it matters)#

Seasonal differencing compares each observation to the observation one full season earlier:

\[\nabla_s y_t = y_t - y_{t-s}\]

For monthly data with yearly seasonality (s=12), this is literally “this month minus the same month last year”.
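In pandas this is a single call: `Series.diff(s)` computes exactly y_t - y_{t-s}. A quick self-contained check (the series here is illustrative):

```python
import numpy as np
import pandas as pd

s = 12
idx = pd.date_range("2015-01-01", periods=48, freq="MS")
y = pd.Series(
    np.arange(48, dtype=float) + 5.0 * np.sin(2 * np.pi * np.arange(48) / s),
    index=idx,
)

# Series.diff(s) computes y_t - y_{t-s}; the first s values are NaN.
seasonal_diff = y.diff(s)
manual = y.values[s:] - y.values[:-s]
assert np.allclose(seasonal_diff.values[s:], manual)
```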

A common practical pattern:

  • d=1 removes a slow trend / random walk-like behavior.

  • D=1 removes a seasonal unit root (seasonal random walk-like behavior).

Example (combined differencing with d=1, D=1):

\[w_t = (1-L)(1-L^s)y_t = y_t - y_{t-1} - y_{t-s} + y_{t-s-1}\]
def difference(x: np.ndarray, *, lag: int = 1, order: int = 1) -> np.ndarray:
    """Return the `order`-times differenced series with step `lag`.

    Examples:
    - lag=1, order=1: x_t - x_{t-1}
    - lag=s, order=1: x_t - x_{t-s}
    """
    x = np.asarray(x, dtype=float)
    out = x
    for _ in range(order):
        out = out[lag:] - out[:-lag]
    return out


y = series.values

d1 = difference(y, lag=1, order=1)
Ds1 = difference(y, lag=season_length, order=1)
d1_Ds1 = difference(difference(y, lag=1, order=1), lag=season_length, order=1)

idx_d1 = series.index[1:]
idx_Ds1 = series.index[season_length:]
idx_d1_Ds1 = series.index[season_length + 1 :]

fig = make_subplots(
    rows=4,
    cols=1,
    shared_xaxes=False,
    vertical_spacing=0.04,
    subplot_titles=(
        "Original series",
        "Non-seasonal difference (d=1)",
        f"Seasonal difference (D=1, s={season_length})",
        f"Combined (d=1, D=1, s={season_length})",
    ),
)

fig.add_trace(go.Scatter(x=series.index, y=y, mode="lines", name="y"), row=1, col=1)
fig.add_trace(go.Scatter(x=idx_d1, y=d1, mode="lines", name="d=1"), row=2, col=1)
fig.add_trace(go.Scatter(x=idx_Ds1, y=Ds1, mode="lines", name="D=1"), row=3, col=1)
fig.add_trace(go.Scatter(x=idx_d1_Ds1, y=d1_Ds1, mode="lines", name="d=1,D=1"), row=4, col=1)

fig.update_layout(
    title="Differencing removes trend and/or seasonality",
    template="plotly_white",
    height=900,
    showlegend=False,
)
fig.show()
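As a sanity check on the combined-differencing identity above, a minimal self-contained sketch: applying (1-L) then (1-L^s) by slicing should match the direct four-term formula y_t - y_{t-1} - y_{t-s} + y_{t-s-1}.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=60).cumsum()  # random-walk-like toy series
s = 12

# Apply (1 - L), then (1 - L^s), by repeated slicing.
d1 = y[1:] - y[:-1]
w = d1[s:] - d1[:-s]

# Direct four-term formula: y_t - y_{t-1} - y_{t-s} + y_{t-s-1}.
direct = y[s + 1:] - y[s:-1] - y[1:-s] + y[:-s - 1]
assert np.allclose(w, direct)
```

The two differencing operators commute, so applying the seasonal difference first gives the same result.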

5) When is SARIMA required?#

You typically need SARIMA (or another seasonal model) when:

  • The series has a stable period s and forecasts must respect that cycle.

  • After handling trend (d), the residuals still show seasonal autocorrelation at lags s, 2s, ….

  • A plain ARIMA forecast systematically misses recurring peaks/troughs.

  • Seasonal differencing (D=1) is needed to make the series look stationary.

If seasonality is deterministic and well-explained by calendar features, SARIMAX (with regressors) or regression + ARMA errors can be a better fit. If seasonality is stochastic (depends on past seasons), SARIMA is often more appropriate.

6) The parameter tuple (p,d,q)(P,D,Q,s)#

  • p: non-seasonal AR order (lags 1..p)

  • d: number of non-seasonal differences

  • q: non-seasonal MA order (lags 1..q)

  • P: seasonal AR order (lags s, 2s, …, Ps)

  • D: number of seasonal differences (lag s)

  • Q: seasonal MA order (lags s, 2s, …, Qs)

  • s: season length

Multiplicative structure means you can get cross-lags. Example:

For SARIMA(1,0,0)(1,0,0,s), the AR polynomial expands as:

\[(1 - \phi_1 L)(1 - \Phi_1 L^s) = 1 - \phi_1 L - \Phi_1 L^s + \phi_1\Phi_1 L^{s+1}\]

So the effective AR part involves lags 1, s, and also s+1.

phi_1 = 0.4
Phi_1 = 0.6

s = season_length
ar_ns = np.array([1.0, -phi_1])
ar_s = np.zeros(s + 1)
ar_s[0] = 1.0
ar_s[s] = -Phi_1

combined = np.convolve(ar_ns, ar_s)
nonzero_lags = [(lag, coef) for lag, coef in enumerate(combined) if abs(coef) > 1e-12]
nonzero_lags
[(0, 1.0), (1, -0.4), (12, -0.6), (13, 0.24)]

7) Low-level NumPy implementation (mechanics)#

Below is a deliberately low-level implementation of SARIMA mechanics:

  • Build the combined AR and MA lag polynomials via convolution.

  • Compute one-step-ahead predictions via an innovations recursion (conditional residuals).

  • Produce multi-step forecasts by setting future innovations to zero (mean forecast).

  • Undo differencing to return forecasts to the original scale.

This is meant for learning (how the model works), not as a drop-in replacement for a production-grade state-space implementation.

def _ar_poly(phi: np.ndarray, Phi: np.ndarray, s: int) -> np.ndarray:
    """Return AR lag polynomial coefficients for phi(L)*Phi(L^s).

    Convention: phi(L) = 1 - phi1 L - ... and Phi(L^s) = 1 - Phi1 L^s - ...
    """
    phi = np.asarray(phi, dtype=float)
    Phi = np.asarray(Phi, dtype=float)

    ar_ns = np.empty(len(phi) + 1)
    ar_ns[0] = 1.0
    ar_ns[1:] = -phi

    ar_s = np.zeros(len(Phi) * s + 1)
    ar_s[0] = 1.0
    for j, Phi_j in enumerate(Phi, start=1):
        ar_s[j * s] = -Phi_j

    return np.convolve(ar_ns, ar_s)


def _ma_poly(theta: np.ndarray, Theta: np.ndarray, s: int) -> np.ndarray:
    """Return MA lag polynomial coefficients for theta(L)*Theta(L^s).

    Convention: theta(L) = 1 + theta1 L + ... and Theta(L^s) = 1 + Theta1 L^s + ...
    """
    theta = np.asarray(theta, dtype=float)
    Theta = np.asarray(Theta, dtype=float)

    ma_ns = np.empty(len(theta) + 1)
    ma_ns[0] = 1.0
    ma_ns[1:] = theta

    ma_s = np.zeros(len(Theta) * s + 1)
    ma_s[0] = 1.0
    for j, Theta_j in enumerate(Theta, start=1):
        ma_s[j * s] = Theta_j

    return np.convolve(ma_ns, ma_s)


def difference_with_forecast_history(y: np.ndarray, *, d: int, D: int, s: int) -> tuple[np.ndarray, list[float], list[np.ndarray]]:
    """Apply differences to y and store the end-of-sample values needed to invert them for forecasting."""
    x = np.asarray(y, dtype=float)

    last_values_nonseasonal: list[float] = []
    for _ in range(d):
        last_values_nonseasonal.append(float(x[-1]))
        x = x[1:] - x[:-1]

    last_values_seasonal: list[np.ndarray] = []
    for _ in range(D):
        last_values_seasonal.append(x[-s:].copy())
        x = x[s:] - x[:-s]

    return x, last_values_nonseasonal, last_values_seasonal


def invert_forecast_differences(
    x_future: np.ndarray,
    *,
    d: int,
    D: int,
    s: int,
    last_values_nonseasonal: list[float],
    last_values_seasonal: list[np.ndarray],
) -> np.ndarray:
    """Invert differencing for future points using stored end-of-sample values."""
    x = np.asarray(x_future, dtype=float)

    # Undo seasonal differencing in reverse order.
    for j in range(D - 1, -1, -1):
        history = list(np.asarray(last_values_seasonal[j], dtype=float))  # length s
        out = np.empty_like(x)
        for i, val in enumerate(x):
            next_val = val + history[-s]
            history.append(next_val)
            out[i] = next_val
        x = out

    # Undo non-seasonal differencing in reverse order.
    for j in range(d - 1, -1, -1):
        running = float(last_values_nonseasonal[j])
        out = np.empty_like(x)
        for i, val in enumerate(x):
            running = running + val
            out[i] = running
        x = out

    return x


def arma_innovations(w: np.ndarray, *, ar_poly: np.ndarray, ma_poly: np.ndarray, constant: float = 0.0) -> tuple[np.ndarray, np.ndarray]:
    """Compute one-step-ahead predictions and innovations (conditional residuals).

    Model form in lag polynomials:
        ar_poly(L) w_t = constant + ma_poly(L) eps_t
    with ar_poly[0]=1 and ma_poly[0]=1.
    """
    w = np.asarray(w, dtype=float)
    K = len(ar_poly) - 1
    M = len(ma_poly) - 1

    w_hat = np.zeros_like(w)
    eps = np.zeros_like(w)

    for t in range(len(w)):
        ar_term = 0.0
        for k in range(1, K + 1):
            if t - k >= 0:
                ar_term += (-ar_poly[k]) * w[t - k]

        ma_term = 0.0
        for k in range(1, M + 1):
            if t - k >= 0:
                ma_term += ma_poly[k] * eps[t - k]

        w_hat[t] = constant + ar_term + ma_term
        eps[t] = w[t] - w_hat[t]

    return w_hat, eps


def arma_forecast(
    w: np.ndarray,
    eps: np.ndarray,
    *,
    ar_poly: np.ndarray,
    ma_poly: np.ndarray,
    steps: int,
    constant: float = 0.0,
) -> np.ndarray:
    """Multi-step mean forecast: future eps are set to 0."""
    K = len(ar_poly) - 1
    M = len(ma_poly) - 1

    w_ext = list(np.asarray(w, dtype=float))
    eps_ext = list(np.asarray(eps, dtype=float))

    out = np.zeros(steps, dtype=float)
    for i in range(steps):
        t = len(w_ext)

        ar_term = 0.0
        for k in range(1, K + 1):
            ar_term += (-ar_poly[k]) * w_ext[t - k]

        ma_term = 0.0
        for k in range(1, M + 1):
            ma_term += ma_poly[k] * eps_ext[t - k]

        w_next = constant + ar_term + ma_term
        out[i] = w_next

        w_ext.append(w_next)
        eps_ext.append(0.0)

    return out
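A quick self-contained sanity check of the forecast recursion's behavior, reimplementing just the pure-AR(1) case of the loop above: with ar_poly = [1, -phi], no MA terms, and zero constant, the h-step mean forecast should decay geometrically as phi^h times the last observation.

```python
import numpy as np

phi = 0.6
ar_poly = np.array([1.0, -phi])  # phi(L) = 1 - 0.6 L

# Minimal version of the forecast loop for a pure AR(1), zero constant.
w_ext = [2.0]  # last observed value of the (stationary) series
forecasts = []
for _ in range(5):
    w_next = (-ar_poly[1]) * w_ext[-1]  # future eps set to 0
    forecasts.append(w_next)
    w_ext.append(w_next)

# Geometric decay: h-step forecast is phi**h * w_T.
expected = [2.0 * phi ** h for h in range(1, 6)]
assert np.allclose(forecasts, expected)
```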
# --- Practical usage: fit SARIMA with statsmodels, then replicate mean forecasts with NumPy ---

train = series.iloc[:-24]
test = series.iloc[-24:]

# A common seasonal choice for monthly data.
order = (1, 1, 1)
seasonal_order = (1, 1, 1, season_length)

model = SARIMAX(
    train,
    order=order,
    seasonal_order=seasonal_order,
    trend="c",  # allow drift (helps when d>0 and the series has trend)
    enforce_stationarity=False,
    enforce_invertibility=False,
)

res = model.fit(disp=False)

steps = len(test)
fc_sm = res.get_forecast(steps=steps)
fc_sm_mean = fc_sm.predicted_mean
fc_sm_ci = fc_sm.conf_int()

# Extract parameters needed for our low-level recursion.
p, d, q = order
P, D, Q, s = seasonal_order
params = res.params

phi = np.array([params.get(f"ar.L{i}", 0.0) for i in range(1, p + 1)], dtype=float)
theta = np.array([params.get(f"ma.L{i}", 0.0) for i in range(1, q + 1)], dtype=float)
Phi = np.array([params.get(f"ar.S.L{(i * s)}", 0.0) for i in range(1, P + 1)], dtype=float)
Theta = np.array([params.get(f"ma.S.L{(i * s)}", 0.0) for i in range(1, Q + 1)], dtype=float)
constant = float(params.get("intercept", params.get("const", 0.0)))

ar_poly = _ar_poly(phi, Phi, s)
ma_poly = _ma_poly(theta, Theta, s)

# Difference training data ourselves and keep the history needed to invert.
w_train, last_non, last_seas = difference_with_forecast_history(train.values, d=d, D=D, s=s)

# Conditional innovations recursion on the differenced series.
w_hat, eps = arma_innovations(w_train, ar_poly=ar_poly, ma_poly=ma_poly, constant=constant)

# Mean forecast in the differenced space.
w_future = arma_forecast(w_train, eps, ar_poly=ar_poly, ma_poly=ma_poly, steps=steps, constant=constant)

# Undo differencing to get forecasts on the original scale.
y_future_numpy = invert_forecast_differences(
    w_future,
    d=d,
    D=D,
    s=s,
    last_values_nonseasonal=last_non,
    last_values_seasonal=last_seas,
)

freq = pd.infer_freq(train.index) or "MS"
forecast_index = pd.date_range(train.index[-1], periods=steps + 1, freq=freq)[1:]
fc_np = pd.Series(y_future_numpy, index=forecast_index, name="forecast_numpy")

res.summary().tables[0]
SARIMAX Results
Dep. Variable:    y
No. Observations: 96
Model:            SARIMAX(1, 1, 1)x(1, 1, 1, 12)
Log Likelihood:   -116.042
Date:             Sat, 31 Jan 2026
Time:             12:34:56
AIC:              244.084
BIC:              257.488
HQIC:             249.402
Sample:           01-01-2010 to 12-01-2017
Covariance Type:  opg
fig = go.Figure()

fig.add_trace(go.Scatter(x=train.index, y=train.values, mode="lines", name="Train"))
fig.add_trace(go.Scatter(x=test.index, y=test.values, mode="lines", name="Test (truth)", line=dict(color="black")))

# Statsmodels forecast + CI band
fig.add_trace(go.Scatter(x=fc_sm_mean.index, y=fc_sm_mean.values, mode="lines", name="SARIMA forecast (statsmodels)"))

lower = fc_sm_ci.iloc[:, 0].values
upper = fc_sm_ci.iloc[:, 1].values
x_ci = np.concatenate([fc_sm_mean.index.values, fc_sm_mean.index.values[::-1]])
y_ci = np.concatenate([upper, lower[::-1]])
fig.add_trace(
    go.Scatter(
        x=x_ci,
        y=y_ci,
        fill="toself",
        fillcolor="rgba(0, 102, 204, 0.15)",
        line=dict(color="rgba(0,0,0,0)"),
        hoverinfo="skip",
        name="95% CI (statsmodels)",
    )
)

# NumPy recreation of the mean forecast (should be close, not necessarily identical)
fig.add_trace(
    go.Scatter(
        x=fc_np.index,
        y=fc_np.values,
        mode="lines",
        name="Mean forecast (NumPy mechanics)",
        line=dict(dash="dash"),
    )
)

fig.update_layout(
    title="Forecasts across seasons (multi-step ahead)",
    xaxis_title="Date",
    yaxis_title="Value",
    template="plotly_white",
    height=520,
)
fig.show()

8) Pitfalls + diagnostics#

  • Wrong season length s: if the true period is 7 (e.g., daily data with a weekly cycle) but you set s=12, the seasonal terms will not help.

  • Over-differencing: too large d/D can introduce unnecessary MA behavior and inflate variance.

  • Deterministic vs stochastic seasonality: if seasonality is explained by calendar dummies, SARIMAX (with regressors) can be simpler.

  • Check residuals: after fitting, the residual ACF should not show spikes at s, 2s, ….

  • Transformations: multiplicative seasonality is often handled via log(y) then additive SARIMA.
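A minimal sketch of the residual-ACF check above (the `sample_acf` helper is illustrative; statsmodels' `acf` does the same job):

```python
import numpy as np

def sample_acf(x, lags):
    """Sample autocorrelation at the given lags (standard biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[k:], x[:-k]) / denom for k in lags])

rng = np.random.default_rng(3)
resid = rng.normal(size=500)  # stand-in for model residuals
s = 12

acf_seasonal = sample_acf(resid, lags=[s, 2 * s, 3 * s])

# For white-noise residuals, |acf| should stay within roughly 2/sqrt(n);
# spikes at s, 2s, ... suggest seasonal structure the model missed.
threshold = 2.0 / np.sqrt(len(resid))
print(acf_seasonal, threshold)
```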

9) Exercises#

  1. Change season_length to 7 and regenerate a synthetic weekly pattern.

  2. Try a simpler model, e.g. SARIMA(1,1,0)(1,1,0,12), and compare forecasts.

  3. Remove the deterministic seasonal term and keep only stochastic seasonal dependence; does D=1 still help?

  4. Add Fourier terms as regressors (SARIMAX) and compare to a seasonal AR/MA specification.

References#

  • Box, Jenkins, Reinsel, Ljung — Time Series Analysis: Forecasting and Control

  • Hyndman & Athanasopoulos — Forecasting: Principles and Practice (seasonal ARIMA chapter)

  • statsmodels SARIMAX documentation